Abstract: A method of channel polarization, proposed by Arıkan, allows us to construct efficient capacity-achieving channel codes. In the original work, binary input discrete memoryless channels are considered. A special case of q-ary channel polarization is considered by Şaşoğlu, Telatar, and Arıkan. In this paper, we consider more general channel polarization on q-ary channels. We further show explicit constructions using Reed-Solomon codes, on which asymptotically fast channel polarization is induced.

I. Introduction 

Channel polarization, proposed by Arıkan, is a method of constructing capacity-achieving codes with low encoding and decoding complexities [1]. Channel polarization can also be used to construct lossy source codes which achieve the rate-distortion trade-off with low encoding and decoding complexities [2]. Arıkan and Telatar derived the rate of channel polarization [3]. In [4], a more detailed rate of channel polarization, which includes the dependence on the coding rate, is derived. In [1], channel polarization is based on a $2 \times 2$ matrix. Korada, Şaşoğlu, and Urbanke considered a generalized polarization phenomenon based on an $\ell \times \ell$ matrix and derived the rate of the generalized channel polarization [5]. In [6], a special case of channel polarization on q-ary channels is considered. In this paper, we consider channel polarization on q-ary channels based on arbitrary mappings.

II. Preliminaries 

Let $u_0^{N-1}$ and $u_i^j$ denote a row vector $(u_0, \ldots, u_{N-1})$ and its subvector $(u_i, \ldots, u_j)$. Let $T^c$ denote the complement of a set $T$, and let $|T|$ denote the cardinality of $T$. Let $\mathcal{X}$ and $\mathcal{Y}$ be an input alphabet and an output alphabet, respectively. In this paper, we assume that $\mathcal{X}$ is finite and that $\mathcal{Y}$ is at most countable. A discrete memoryless channel (DMC) $W$ is defined as a conditional probability distribution $W(y \mid x)$ over $\mathcal{Y}$, where $x \in \mathcal{X}$ and $y \in \mathcal{Y}$. We write $W : \mathcal{X} \to \mathcal{Y}$ to mean a DMC $W$ with input alphabet $\mathcal{X}$ and output alphabet $\mathcal{Y}$. Let $q$ be the cardinality of $\mathcal{X}$. In this paper, the base of the logarithm is $q$ unless otherwise stated.

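Definition 1: The symmetric capacity of a q-ary input channel $W : \mathcal{X} \to \mathcal{Y}$ is defined as

$$I(W) := \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} \frac{1}{q} W(y \mid x) \log \frac{W(y \mid x)}{\sum_{x' \in \mathcal{X}} \frac{1}{q} W(y \mid x')}.$$

Note that $I(W) \in [0, 1]$.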

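Definition 2: Let $\mathcal{D}_x := \{y \in \mathcal{Y} \mid W(y \mid x) > W(y \mid x'),\ \forall x' \in \mathcal{X},\ x' \neq x\}$. The error probability of the maximum-likelihood estimation of the input $x$ on the basis of the output $y$ of the channel $W$ is defined as

$$P_e(W) := \frac{1}{q} \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{D}_x^c} W(y \mid x).$$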

Definition 3: The Bhattacharyya parameter of $W$ is defined as

$$Z(W) := \frac{1}{q(q-1)} \sum_{x \in \mathcal{X}} \sum_{x' \in \mathcal{X},\, x' \neq x} Z_{x,x'}(W)$$

where the Bhattacharyya parameter of $W$ between $x$ and $x'$ is defined as

$$Z_{x,x'}(W) := \sum_{y \in \mathcal{Y}} \sqrt{W(y \mid x)\, W(y \mid x')}.$$

The symmetric capacity $I(W)$, the error probability $P_e(W)$, and the Bhattacharyya parameter $Z(W)$ are interrelated as in the following lemmas.

Lemma 4:

$$P_e(W) \le (q-1) Z(W).$$

Lemma 5 [6]:

$$I(W) \ge \log \frac{q}{1 + (q-1) Z(W)}$$

$$I(W) \le \log(q/2) + (\log 2) \sqrt{1 - Z(W)^2}$$

$$I(W) \le 2(q-1)(\log e) \sqrt{1 - Z(W)^2}.$$
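The quantities of Definitions 1-3 and the inequalities of Lemmas 4 and 5 can be checked numerically on a small example. The following Python sketch is not from the paper; it assumes the channel is stored as a finite matrix `W[x][y]` $= W(y \mid x)$ with rows summing to 1, and the helper `qsc` (a q-ary symmetric channel) is a hypothetical test case.

```python
import math

def symmetric_capacity(W):
    """I(W) of Definition 1 in base-q units; W[x][y] = W(y | x)."""
    q = len(W)
    total = 0.0
    for y in range(len(W[0])):
        avg = sum(W[x][y] for x in range(q)) / q  # P(y) under a uniform input
        for x in range(q):
            if W[x][y] > 0:
                total += W[x][y] / q * math.log(W[x][y] / avg, q)
    return total

def ml_error_probability(W):
    """P_e(W) of Definition 2: y is decoded correctly for input x only if
    W(y | x) > W(y | x') holds strictly for every x' != x."""
    q = len(W)
    err = 0.0
    for x in range(q):
        for y in range(len(W[0])):
            if not all(W[x][y] > W[xp][y] for xp in range(q) if xp != x):
                err += W[x][y]
    return err / q

def pair_bhattacharyya(W, x, xp):
    """Z_{x,x'}(W) of Definition 3."""
    return sum(math.sqrt(a * b) for a, b in zip(W[x], W[xp]))

def bhattacharyya(W):
    """Z(W): mean of Z_{x,x'}(W) over the q(q-1) ordered pairs with x != x'."""
    q = len(W)
    s = sum(pair_bhattacharyya(W, x, xp)
            for x in range(q) for xp in range(q) if xp != x)
    return s / (q * (q - 1))

def qsc(q, p):
    """Hypothetical test channel: q-ary symmetric channel, error probability p."""
    return [[1 - p if y == x else p / (q - 1) for y in range(q)] for x in range(q)]

q = 4
W = qsc(q, 0.1)
I, Pe, Z = symmetric_capacity(W), ml_error_probability(W), bhattacharyya(W)
print(f"I = {I:.4f}, Pe = {Pe:.4f}, Z = {Z:.4f}")
# Lemma 4 and the three bounds of Lemma 5 (logarithms to base q)
assert Pe <= (q - 1) * Z
assert I >= math.log(q / (1 + (q - 1) * Z), q)
assert I <= math.log(q / 2, q) + math.log(2, q) * math.sqrt(1 - Z ** 2)
assert I <= 2 * (q - 1) * math.log(math.e, q) * math.sqrt(1 - Z ** 2)
```

For the 4-ary symmetric channel with error probability 0.1 this gives $I(W) \approx 0.686$, $P_e(W) = 0.1$ and $Z(W) \approx 0.413$, and all four inequalities hold.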

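Definition 6: The maximum and the minimum of the Bhattacharyya parameters between two symbols are defined as

$$Z_{\max}(W) := \max_{x \in \mathcal{X},\, x' \in \mathcal{X},\, x \neq x'} Z_{x,x'}(W)$$

$$Z_{\min}(W) := \min_{x \in \mathcal{X},\, x' \in \mathcal{X},\, x \neq x'} Z_{x,x'}(W).$$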

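Let $\sigma : \mathcal{X} \to \mathcal{X}$ be a permutation. Let $\sigma^i$ denote the $i$-th power of $\sigma$. The average Bhattacharyya parameter of $W$ between $x$ and $x'$ with respect to $\sigma$ is defined as the average of $Z_{z,z'}(W)$ over the subset $\{(z, z') = (\sigma^i(x), \sigma^i(x')) \in \mathcal{X}^2 \mid i = 0, 1, \ldots, q! - 1\}$ as

$$Z^{\sigma}_{x,x'}(W) := \frac{1}{q!} \sum_{i=0}^{q!-1} Z_{\sigma^i(x), \sigma^i(x')}(W).$$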
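The average Bhattacharyya parameter is a plain average over the powers of $\sigma$, so it can be computed directly. In this sketch (an illustration, not the authors' code) a permutation is a Python list with `sigma[a]` $= \sigma(a)$, symbols are `0, ..., q-1`, and the ternary channel `W` is a made-up example. Because $\sigma^{q!}$ is the identity, the loop traverses the orbit of $(x, x')$ a whole number of times, so the result also equals the average over the distinct pairs in that orbit.

```python
import math

def pair_bhattacharyya(W, x, xp):
    """Z_{x,x'}(W) = sum_y sqrt(W(y|x) W(y|x')), with W[x][y] = W(y | x)."""
    return sum(math.sqrt(a * b) for a, b in zip(W[x], W[xp]))

def average_bhattacharyya(W, sigma, x, xp):
    """Z^sigma_{x,x'}(W): mean of Z_{sigma^i(x), sigma^i(x')}(W) for i = 0, ..., q!-1."""
    q = len(W)
    n = math.factorial(q)
    total = 0.0
    z, zp = x, xp
    for _ in range(n):
        total += pair_bhattacharyya(W, z, zp)
        z, zp = sigma[z], sigma[zp]  # advance to sigma^{i+1}(x), sigma^{i+1}(x')
    return total / n

# Hypothetical non-symmetric ternary channel, W[x][y] = W(y | x)
W = [[0.7, 0.2, 0.1],
     [0.1, 0.8, 0.1],
     [0.3, 0.3, 0.4]]
identity, shift = [0, 1, 2], [1, 2, 0]
print(average_bhattacharyya(W, identity, 0, 1))  # reduces to Z_{0,1}(W)
print(average_bhattacharyya(W, shift, 0, 1))     # mean of Z_{0,1}, Z_{1,2}, Z_{2,0}
```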
III. Channel polarization on q-ary DMC induced by non-linear kernel

We consider a channel transform using a one-to-one onto mapping $g : \mathcal{X}^\ell \to \mathcal{X}^\ell$, which is called a kernel. In the previous works [1], [5], it is assumed that $q = 2$ and that $g$ is linear. In [6], $\mathcal{X}$ is arbitrary but $g$ is restricted. In this paper, $\mathcal{X}$ and $g$ are arbitrary.

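Definition 7: Let $W : \mathcal{X} \to \mathcal{Y}$ be a DMC. Let $W^{(i)} : \mathcal{X} \to \mathcal{Y}^\ell \times \mathcal{X}^{i}$ for $i = 0, \ldots, \ell-1$ (so that $W^{(0)} : \mathcal{X} \to \mathcal{Y}^\ell$ and $W^{(\ell-1)} : \mathcal{X} \to \mathcal{Y}^\ell \times \mathcal{X}^{\ell-1}$) be defined as DMCs with transition probabilities

$$W^{(i)}(y_0^{\ell-1}, u_0^{i-1} \mid u_i) := \frac{1}{q^{\ell-1}} \sum_{u_{i+1}^{\ell-1} \in \mathcal{X}^{\ell-i-1}} W^\ell\big(y_0^{\ell-1} \mid g(u_0^{\ell-1})\big)$$

where $W^\ell(y_0^{\ell-1} \mid x_0^{\ell-1}) := \prod_{j=0}^{\ell-1} W(y_j \mid x_j)$.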
Definition 8: Let $\{B_i\}_{i=0,1,\ldots}$ be independent random variables such that $B_i = k$ with probability $1/\ell$, for each $k = 0, \ldots, \ell-1$.

In the probabilistic channel transform $W \to W^{(B_i)}$, the expectation of the symmetric capacity is invariant, i.e., $E[I(W^{(B_i)})] = I(W)$, due to the chain rule for mutual information. The following lemma is a consequence of the martingale convergence theorem.
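Definition 7 and the invariance of the expected symmetric capacity can be checked by brute force for tiny $q$ and $\ell$. The sketch below is a hypothetical illustration: it assumes the transition probabilities of Definition 7 take the standard form in which $W^\ell(y_0^{\ell-1} \mid g(u_0^{\ell-1}))/q^{\ell-1}$ is summed over $u_{i+1}^{\ell-1}$, and it uses the kernel $g(u_0, u_1) = (u_0 + u_1 \bmod 3, u_1)$ on a made-up ternary channel. Any bijective `g` could be substituted.

```python
import itertools
import math

def symmetric_capacity(W, q):
    """I(W) in base-q units for a channel stored as {output: [W(output | x) for x]}."""
    total = 0.0
    for probs in W.values():
        avg = sum(probs) / q
        for p in probs:
            if p > 0:
                total += p / q * math.log(p / avg, q)
    return total

def subchannel(Wmat, g, ell, i):
    """W^(i) for a bijective kernel g on X^ell, assuming the standard form
    W^(i)(y, u_0^{i-1} | u_i) = q^-(ell-1) * sum over u_{i+1}^{ell-1} of W^ell(y | g(u))."""
    q, n_out = len(Wmat), len(Wmat[0])
    W_i = {}
    for u in itertools.product(range(q), repeat=ell):
        x = g(u)
        for y in itertools.product(range(n_out), repeat=ell):
            p = math.prod(Wmat[xj][yj] for xj, yj in zip(x, y)) / q ** (ell - 1)
            # different u_{i+1}^{ell-1} land on the same key, which performs the sum
            W_i.setdefault((y, u[:i]), [0.0] * q)[u[i]] += p
    return W_i

q, ell = 3, 2

def g(u):
    """Hypothetical example kernel on (Z/3Z)^2: (u0, u1) -> (u0 + u1, u1)."""
    return ((u[0] + u[1]) % q, u[1])

Wmat = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]  # made-up channel
I_W = symmetric_capacity({y: [Wmat[x][y] for x in range(q)] for y in range(q)}, q)
I_sub = [symmetric_capacity(subchannel(Wmat, g, ell, i), q) for i in range(ell)]
print(I_W, I_sub, sum(I_sub) / ell)
```

The average $\frac{1}{\ell}\sum_i I(W^{(i)})$ agrees with $I(W)$ up to floating-point error, which is the martingale property behind Lemma 9; here $I(W^{(0)}) \le I(W) \le I(W^{(1)})$.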

Lemma 9: There exists a random variable $I_\infty$ such that $I(W^{(B_1)\cdots(B_n)})$ converges to $I_\infty$ almost surely as $n \to \infty$.

When $q = 2$ and $g(u_0^1) = (u_0 + u_1, u_1)$, Arıkan showed that $P(I_\infty \in \{0, 1\}) = 1$ [1]. This result is called the channel polarization phenomenon since the subchannels polarize to noiseless channels and pure noise channels. Korada, Şaşoğlu, and Urbanke considered the channel polarization phenomenon when $q = 2$ and $g$ is linear [5].
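For the binary erasure channel (BEC) the transform with Arıkan's kernel can be tracked in closed form: if $W$ is a BEC with erasure probability $z$ (which equals $Z(W)$), then $W^{(0)}$ and $W^{(1)}$ are BECs with erasure probabilities $2z - z^2$ and $z^2$. The sketch below relies on that standard BEC fact and is restricted to this special case; it enumerates all $2^n$ subchannels and reports how many are within `delta` of being noiseless or pure noise.

```python
def bec_subchannel_erasures(eps, n):
    """Erasure probabilities of all 2^n subchannels W^{(b_1)...(b_n)} of a BEC(eps)
    under Arikan's 2x2 kernel; for a BEC the erasure probability equals Z(W)."""
    zs = [eps]
    for _ in range(n):
        nxt = []
        for z in zs:
            nxt.append(2 * z - z * z)  # b = 0: u_0 estimated with u_1 unknown
            nxt.append(z * z)          # b = 1: u_1 estimated with u_0 known
        zs = nxt
    return zs

def polarized_fraction(zs, delta):
    """Fraction of subchannels whose Z is within delta of 0 or of 1."""
    return sum(1 for z in zs if z < delta or z > 1 - delta) / len(zs)

eps = 0.5
for n in (4, 8, 12, 16):
    zs = bec_subchannel_erasures(eps, n)
    mean_capacity = sum(1 - z for z in zs) / len(zs)  # stays equal to 1 - eps
    print(n, round(mean_capacity, 6), polarized_fraction(zs, 1e-3))
```

The mean capacity stays at $1 - \epsilon$ for every $n$ while the polarized fraction grows, which is the behaviour $P(I_\infty \in \{0,1\}) = 1$ describes.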

From Lemma 5, $I(W)$ is close to 0 and 1 when $Z(W)$ is close to 1 and 0, respectively. Hence, to prove channel polarization it would suffice to show that $Z(W^{(B_1)\cdots(B_n)})$ converges to some $Z_\infty \in \{0, 1\}$ almost surely. Here we instead show a weaker version of this property in the following lemma and its corollary.

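Lemma 10: Let $\{\mathcal{Y}_n\}_{n \in \mathbb{N}}$ be a sequence of discrete sets. Let $\{W_n : \mathcal{X} \to \mathcal{Y}_n\}_{n \in \mathbb{N}}$ be a sequence of q-ary DMCs. Let $\sigma$ and $\tau$ be permutations on $\mathcal{X}$. Let

$$W'_n(y_1, y_2 \mid x) = W_n(y_1 \mid \sigma(x))\, W_n(y_2 \mid \tau(x))$$

where $W'_n : \mathcal{X} \to \mathcal{Y}_n^2$. Assume $\lim_{n \to \infty} \big(I(W'_n) - I(W_n)\big) = 0$. Then, for any $\delta \in (0, 1/2)$, there exists $m$ such that $Z^{\tau \circ \sigma^{-1}}_{x,x'}(W_n) \notin (\delta, 1 - \delta)$ for any $x \in \mathcal{X}$, $x' \in \mathcal{X}$ and $n > m$.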



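Proof: Let $Z$, $Y_1$ and $Y_2$ be random variables which take values on $\mathcal{X}$, $\mathcal{Y}_n$ and $\mathcal{Y}_n$, respectively, and jointly obey the distribution

$$P_n(Z = z, Y_1 = y_1, Y_2 = y_2) = \frac{1}{q} W_n(y_1 \mid \sigma(z))\, W_n(y_2 \mid \tau(z)).$$

Since $I(W'_n) = I(Z; Y_1, Y_2)$ and $I(W_n) = I(Z; Y_1)$,

$$I(Z; Y_1, Y_2) - I(Z; Y_1) = I(Z; Y_2 \mid Y_1)$$

tends to 0 by the assumption. Since the mutual information is lower bounded by the cut-off rate, one obtains

$$\begin{aligned}
I(Z; Y_2 \mid Y_1) &\ge -\log \sum_{y_1} P_n(Y_1 = y_1) \sum_{y_2} \Big( \sum_{z} P_n(Z = z \mid Y_1 = y_1) \sqrt{P_n(Y_2 = y_2 \mid Z = z, Y_1 = y_1)} \Big)^2 \\
&= -\log \sum_{y_1} P_n(Y_1 = y_1) \sum_{z, x} P_n(Z = z \mid Y_1 = y_1)\, P_n(Z = x \mid Y_1 = y_1)\, Z_{\tau(z), \tau(x)}(W_n) \\
&= -\log \sum_{y_1} \sum_{z, x} q_n(y_1, z, x)\, Z_{\tau(\sigma^{-1}(z)), \tau(\sigma^{-1}(x))}(W_n)
\end{aligned}$$

where

$$q_n(y_1, z, x) := P_n(Y_1 = y_1)\, P_n(Z = \sigma^{-1}(z) \mid Y_1 = y_1)\, P_n(Z = \sigma^{-1}(x) \mid Y_1 = y_1).$$

Since

$$\begin{aligned}
\sum_{y_1 \in \mathcal{Y}_n} q_n(y_1, z, x) &= \sum_{y_1 \in \mathcal{Y}_n} P_n(Y_1 = y_1) \Big( \sqrt{P_n(Z = \sigma^{-1}(z) \mid Y_1 = y_1)\, P_n(Z = \sigma^{-1}(x) \mid Y_1 = y_1)} \Big)^2 \\
&\ge \Big( \sum_{y_1 \in \mathcal{Y}_n} P_n(Y_1 = y_1) \sqrt{P_n(Z = \sigma^{-1}(z) \mid Y_1 = y_1)\, P_n(Z = \sigma^{-1}(x) \mid Y_1 = y_1)} \Big)^2 \\
&= \frac{1}{q^2} Z_{z,x}(W_n)^2
\end{aligned}$$

and $\sum_{y_1} \sum_{z, x} q_n(y_1, z, x) = 1$, it holds

$$I(Z; Y_2 \mid Y_1) \ge -\log \Big( 1 - \frac{1}{q^2} \sum_{z, x} Z_{z,x}(W_n)^2 \big( 1 - Z_{\tau(\sigma^{-1}(z)), \tau(\sigma^{-1}(x))}(W_n) \big) \Big).$$

Since every summand is non-negative, the convergence of $I(Z; Y_2 \mid Y_1)$ to 0 implies that

$$Z_{z,x}(W_n)^2 \big( 1 - Z_{\tau(\sigma^{-1}(z)), \tau(\sigma^{-1}(x))}(W_n) \big)$$

converges to 0 for any $(z, x) \in \mathcal{X}^2$. Along the orbit of $(x, x')$ under $\tau \circ \sigma^{-1}$, a term that stays away from 0 therefore forces the next term, and hence every term of the cyclic orbit, to approach 1, so the terms of the average are either all close to 0 or all close to 1. It consequently implies that for any $\delta \in (0, 1/2)$, there exists $m$ such that $Z^{\tau \circ \sigma^{-1}}_{x,x'}(W_n) \notin (\delta, 1 - \delta)$ for any $x \in \mathcal{X}$, $x' \in \mathcal{X}$ and $n > m$.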

■ 

Using Lemma 10, one can obtain a partial result on channel polarization as follows.

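Corollary 11: Assume that there exist $u_0^{\ell-2} \in \mathcal{X}^{\ell-1}$, $(i, j) \in \{0, 1, \ldots, \ell-1\}^2$ and permutations $\sigma$ and $\tau$ on $\mathcal{X}$ such that the $i$-th and $j$-th elements of $g(u_0^{\ell-1})$ are $\sigma(u_{\ell-1})$ and $\tau(u_{\ell-1})$, respectively, and such that for any $v_0^{\ell-2} \in \mathcal{X}^{\ell-1}$ with $v_0^{\ell-2} \neq u_0^{\ell-2}$ there exist $m' \in \{0, 1, \ldots, \ell-1\}$ and a permutation $\mu$ on $\mathcal{X}$ such that the $m'$-th element of $g(v_0^{\ell-1})$ is $\mu(v_{\ell-1})$. Then, for almost every sequence $b_1, \ldots, b_n, \ldots$ of $0, \ldots, \ell-1$, and for any $\delta \in (0, 1/2)$, there exists $m$ such that $Z^{\tau \circ \sigma^{-1}}_{x,x'}(W^{(b_1)\cdots(b_n)}) \notin (\delta, 1 - \delta)$ for any $x \in \mathcal{X}$, $x' \in \mathcal{X}$ and $n > m$.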

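Proof: Since $I(W^{(B_1)\cdots(B_n)})$ converges to $I_\infty$ almost surely, $|I(W^{(B_1)\cdots(B_n)(\ell-1)}) - I(W^{(B_1)\cdots(B_n)})|$ has to converge to 0 almost surely. Let $U_0^{\ell-1}$ and $Y_0^{\ell-1}$ denote random variables ranging over $\mathcal{X}^\ell$ and $\mathcal{Y}^\ell$, and obeying the distribution

$$P(U_0^{\ell-1} = u_0^{\ell-1}, Y_0^{\ell-1} = y_0^{\ell-1}) = \frac{1}{q^\ell} W^\ell\big(y_0^{\ell-1} \mid g(u_0^{\ell-1})\big).$$

Then, it holds

$$\begin{aligned}
I(W^{(\ell-1)}) &= I(Y_0^{\ell-1}, U_0^{\ell-2}; U_{\ell-1}) \\
&= I(Y_0^{\ell-1}; U_{\ell-1} \mid U_0^{\ell-2}) \\
&= \frac{1}{q^{\ell-1}} \sum_{u_0^{\ell-2} \in \mathcal{X}^{\ell-1}} I(Y_0^{\ell-1}; U_{\ell-1} \mid U_0^{\ell-2} = u_0^{\ell-2}).
\end{aligned}$$

From the assumption, $I(Y_0^{\ell-1}; U_{\ell-1} \mid U_0^{\ell-2} = v_0^{\ell-2}) \ge I(W)$ for all $v_0^{\ell-2} \in \mathcal{X}^{\ell-1}$, and for $v_0^{\ell-2} = u_0^{\ell-2}$ this conditional mutual information is at least $I(W')$, where $W'(y_1, y_2 \mid x) := W(y_1 \mid \sigma(x))\, W(y_2 \mid \tau(x))$ as in Lemma 10. Hence $I(W^{(\ell-1)}) - I(W) \ge q^{-(\ell-1)} \big(I(W') - I(W)\big) \ge 0$. Applying this with $W_n := W^{(B_1)\cdots(B_n)}$ in place of $W$, $I(W'_n) - I(W_n)$ has to converge to 0 almost surely. By applying Lemma 10, one obtains the result. ■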
When $q = 2$, since $Z(W) = Z_{0,1}(W)$, this corollary immediately implies the channel polarization phenomenon, although it is not sufficient for general $q \neq 2$. Note that this derivation does not use extra conditions, e.g., symmetry of the DMC or linearity of the kernel.

If a kernel is linear, a more detailed condition is obtained. 

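Definition 12: Assume $(\mathcal{X}, +, \cdot)$ is a commutative ring. A kernel $g : \mathcal{X}^\ell \to \mathcal{X}^\ell$ is said to be linear if $g(ax + bz) = a g(x) + b g(z)$ for all $a \in \mathcal{X}$, $b \in \mathcal{X}$, $x \in \mathcal{X}^\ell$, and $z \in \mathcal{X}^\ell$.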

If $g$ is linear, $g$ can be represented by a square matrix $G$ such that $g(u_0^{\ell-1}) = u_0^{\ell-1} G$. Let $U_0^{\ell-1}$, $X_0^{\ell-1}$ and $Y_0^{\ell-1}$ denote random variables taking values on $\mathcal{X}^\ell$, $\mathcal{X}^\ell$ and $\mathcal{Y}^\ell$, respectively, and obeying the distribution

$$P(U_0^{\ell-1} = u_0^{\ell-1}, X_0^{\ell-1} = x_0^{\ell-1}, Y_0^{\ell-1} = y_0^{\ell-1}) = \frac{1}{q^\ell} W^\ell\big(y_0^{\ell-1} \mid x_0^{\ell-1} G\big)\, \mathbb{I}\{u_0^{\ell-1} V = x_0^{\ell-1}\}$$

where $V$ denotes an $\ell \times \ell$ full-rank upper triangular matrix and $\mathbb{I}\{\cdot\}$ denotes the indicator function. There exists a one-to-one correspondence between $X_0^i$ and $U_0^i$ for all $i \in \{0, \ldots, \ell-1\}$. Hence, the statistical properties of $W^{(i)}$ are invariant under the operation $G \to VG$. Further, a permutation of the columns of $G$ does not change the statistical properties either. Since any full-rank matrix can be decomposed into the form $VLP$, where $V$, $L$, and $P$ are upper triangular, lower triangular, and permutation matrices, respectively, without loss of generality we assume that $G$ is a lower triangular matrix and that $G_{kk} = 1$, where $G_{ij}$ denotes the $(i, j)$ element of $G$ and $k \in \{0, \ldots, \ell-1\}$ is the largest number such that the number of non-zero elements in the $k$-th row of $G$ is greater than 1.
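The normalization above can be checked mechanically for a concrete matrix. The following sketch is hypothetical: it treats $\mathcal{X}$ as $\mathrm{GF}(p)$ with $p$ prime (integers mod $p$), uses a made-up $3 \times 3$ lower triangular kernel, verifies by enumeration that $u \mapsto uG$ is one-to-one onto, and finds the row index $k$ of the text together with the columns $m < k$ for which $G_{km} \neq 0$ (the entries used in the proof of Theorem 13).

```python
import itertools

def apply_kernel(u, G, p):
    """g(u) = u G over GF(p), p prime (integers mod p)."""
    ell = len(G)
    return tuple(sum(u[i] * G[i][j] for i in range(ell)) % p for j in range(ell))

def is_bijective(G, p):
    """Brute-force check that u -> u G is one-to-one onto on GF(p)^ell."""
    ell = len(G)
    images = {apply_kernel(u, G, p) for u in itertools.product(range(p), repeat=ell)}
    return len(images) == p ** ell

def is_lower_triangular(G):
    return all(G[i][j] == 0 for i in range(len(G)) for j in range(i + 1, len(G)))

def pivot_row(G):
    """Largest k whose row has more than one non-zero entry; None if G is diagonal."""
    ks = [k for k, row in enumerate(G) if sum(1 for a in row if a != 0) > 1]
    return max(ks) if ks else None

p = 3
G = [[1, 0, 0],
     [2, 1, 0],
     [1, 0, 1]]  # hypothetical lower triangular kernel over GF(3)
k = pivot_row(G)
off_diagonal = [m for m in range(k) if G[k][m] != 0]  # columns m < k with G_km != 0
print(is_bijective(G, p), is_lower_triangular(G), k, G[k][k], off_diagonal)
```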

Theorem 13: Assume that $\mathcal{X}$ is a field of prime cardinality, and that the linear kernel $G$ is not diagonal. Then, $P(I_\infty \in \{0, 1\}) = 1$.

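Proof: It holds

$$W^{(k)}(y_0^{\ell-1}, u_0^{k-1} \mid u_k) = \frac{1}{q^{k}} \prod_{j=k+1}^{\ell-1} \Big( \frac{1}{q} \sum_{x \in \mathcal{X}} W(y_j \mid x) \Big) \prod_{j \in S_0} W(y_j \mid x_j) \prod_{j \in S_1} W(y_j \mid G_{kj} u_k + x_j)$$

where $S_0 := \{j \in \{0, \ldots, k\} \mid G_{kj} = 0\}$, $S_1 := \{j \in \{0, \ldots, k\} \mid G_{kj} \neq 0\}$, and $x_j$ is the $j$-th element of $(u_0^{k-1}, 0_k^{\ell-1}) G$, where $0_k^{\ell-1}$ is the all-zero vector of length $\ell - k$. (The first product appears because, by the choice of $k$, rows $k+1, \ldots, \ell-1$ of the lower triangular $G$ have their only non-zero entry on the diagonal.) Let $m \in \{0, \ldots, k-1\}$ be such that $G_{km} \neq 0$. Since each $u_0^{k-1}$ occurs with positive probability $1/q^k$, we can apply Lemma 10 with $\sigma(x) = x$ and $\tau(x) = G_{km} x + z$ for arbitrary $z \in \mathcal{X}$. Hence, for sufficiently large $n$, $Z^{\mu_i}_{x,x'}(W^{(B_1)\cdots(B_n)})$ is close to 0 or 1 almost surely, where $\mu_i(x) = G_{km}^i x + z$, for all $i \in \{0, \ldots, q-2\}$ and $z \in \mathcal{X}$. Since $q$ is a prime, when the constant is chosen as $z = x' - x$ for $x \neq x'$, i.e., $\mu_0(w) = w + x' - x$, $Z^{\mu_0}_{x,x'}(W^{(B_1)\cdots(B_n)})$ is close to 0 or 1 if and only if $Z(W^{(B_1)\cdots(B_n)})$ is close to 0 or 1, respectively. ■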
This result is a simple generalization of the special case considered by Şaşoğlu, Telatar, and Arıkan [6]. For a prime power $q$ and a finite field $\mathcal{X}$, we show a sufficient condition for channel polarization in the following corollary.

Corollary 14: Assume that $\mathcal{X}$ is a field and that a linear kernel $G$ is not diagonal. If there exists $j \in \{0, \ldots, k-1\}$ such that $G_{kj}$ is a primitive element, then $P(I_\infty \in \{0, 1\}) = 1$.

Proof: By applying Lemma 10, one sees that for almost every sequence $b_1, \ldots, b_n, \ldots$ of $0, \ldots, \ell-1$, and for any $\delta \in (0, 1/2)$, there exists $m$ such that $Z^{\sigma}_{x,x'}(W^{(b_1)\cdots(b_n)}) \notin (\delta, 1 - \delta)$ for any $x \in \mathcal{X}$, $x' \in \mathcal{X}$ and $n > m$, where $\sigma(x) = G_{kj} x + z$ for arbitrary $z \in \mathcal{X}$. It suffices to show that for any $x \in \mathcal{X}$ and $x' \in \mathcal{X}$ with $x \neq x'$, $Z^{\sigma}_{x,x'}(W^{(B_1)\cdots(B_n)})$ is close to 1 if and only if $Z(W^{(B_1)\cdots(B_n)})$ is close to 1. When $Z^{\sigma}_{x,x'}(W^{(B_1)\cdots(B_n)})$ is close to 1, $Z_{0, G_{kj}(x'-x)}(W^{(B_1)\cdots(B_n)})$ is close to 1. Hence, $Z_{0, G_{kj}^i(x'-x)}(W^{(B_1)\cdots(B_n)})$ is close to 1 for any $i \in \{0, \ldots, q-2\}$. Since $G_{kj}$ is a primitive element, $Z_{0,x}(W^{(B_1)\cdots(B_n)})$ is close to 1 for any $x \in \mathcal{X}$. This completes the proof. ■
In [7], it is shown that the channel polarization phenomenon occurs when using a random kernel in which $G_{kj}$ is chosen uniformly from the nonzero elements. Corollary 14 says that a deterministic primitive element $G_{kj}$ is sufficient for the channel polarization phenomenon.
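Checking the hypothesis of Corollary 14 requires arithmetic in $\mathrm{GF}(q)$ for a prime power $q$. The sketch below assumes $q = 2^m$, represents field elements as bit vectors of polynomial coefficients, and uses $x^4 + x + 1$ as the defining polynomial of $\mathrm{GF}(16)$; these representation choices are ours, not the paper's. An element is primitive exactly when its multiplicative order is $q - 1$.

```python
def gf2m_mul(a, b, m, poly):
    """Multiply a and b in GF(2^m); elements are bit vectors of polynomial
    coefficients and poly is an irreducible polynomial of degree m (x^m bit included)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << m):
            a ^= poly  # reduce modulo poly
    return r

def multiplicative_order(a, m, poly):
    """Smallest k >= 1 with a^k = 1 (a must be non-zero)."""
    x, k = a, 1
    while x != 1:
        x = gf2m_mul(x, a, m, poly)
        k += 1
    return k

def is_primitive(a, m, poly):
    return a != 0 and multiplicative_order(a, m, poly) == 2 ** m - 1

def satisfies_corollary14(G, k, m, poly):
    """True if some G[k][j] with j < k is a primitive element of GF(2^m)."""
    return any(is_primitive(G[k][j], m, poly) for j in range(k))

# GF(16) built with x^4 + x + 1 (bit pattern 0b10011); the element 2 encodes x
m, poly = 4, 0b10011
prims = [a for a in range(1, 16) if is_primitive(a, m, poly)]
print(prims)
```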

IV. Speed of polarization 

Arıkan and Telatar showed the speed of polarization [3]. Korada, Şaşoğlu, and Urbanke generalized it to any binary linear kernel [5].



Proposition 15: Let $\{X_n \in (0, 1)\}_{n \in \mathbb{N}}$ be a random process satisfying the following properties.

1) $X_n$ converges to $X_\infty$ almost surely.

2) $X_{n+1} \le c X_n^{D_n}$, where $\{D_n \ge 1\}_{n \in \mathbb{N}}$ are independent and identically distributed random variables, and $c$ is a constant.

Then,

$$\lim_{n \to \infty} P\big(X_n < 2^{-2^{n\beta}}\big) = P(X_\infty = 0)$$

for $\beta < E[\log_2 D_1]$, where $E[\cdot]$ denotes an expectation. Similarly, let $\{X_n \in (0, 1)\}_{n \in \mathbb{N}}$ be a random process satisfying the following properties.

1) $X_n$ converges to $X_\infty$ almost surely.

2) $X_{n+1} \ge c X_n^{D_n}$, where $\{D_n \ge 1\}_{n \in \mathbb{N}}$ are independent and identically distributed random variables, and $c$ is a constant.

Then,

$$\lim_{n \to \infty} P\big(X_n < 2^{-2^{n\beta}}\big) = 0$$

for $\beta > E[\log_2 D_1]$.

Note that the above proposition can straightforwardly be extended to include the rate dependence [4].
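Proposition 15 can be watched at work on the binary erasure channel with Arıkan's kernel, where $Z_{n+1}$ equals $2Z_n - Z_n^2 \le 2Z_n$ or $Z_n^2$ with probability 1/2 each. In the notation of the first part this is $c = 2$ and $D_n \in \{1, 2\}$ equally likely, so $E[\log_2 D_1] = 1/2$ and $P(Z_n < 2^{-2^{n\beta}}) \to P(Z_\infty = 0) = 1 - \epsilon$ for $\beta < 1/2$. This BEC-only sketch relies on that standard closed form and computes the exact fraction over all $2^n$ equally likely paths; the function names are ours.

```python
def bec_subchannel_erasures(eps, n):
    """Z of all 2^n subchannels of a BEC(eps) under Arikan's 2x2 kernel."""
    zs = [eps]
    for _ in range(n):
        zs = [w for z in zs for w in (2 * z - z * z, z * z)]
    return zs

def fraction_below(zs, n, beta):
    """Exact P(Z_n < 2^(-2^(n beta))) over the 2^n equally likely paths."""
    threshold = 2.0 ** (-(2.0 ** (n * beta)))
    return sum(1 for z in zs if z < threshold) / len(zs), threshold

eps, n = 0.5, 16
zs = bec_subchannel_erasures(eps, n)
for beta in (0.2, 0.3, 0.4):
    frac, thr = fraction_below(zs, n, beta)
    print(beta, thr, frac)
```

At finite $n$ the fraction approaches its limit only slowly, and it can never exceed $(1 - \epsilon)/(1 - \text{threshold})$, because the capacities $1 - Z$ of the subchannels average to $1 - \epsilon$.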

In order to apply Proposition 15 to $Z_{\max}(W^{(B_1)\cdots(B_n)})$ and $Z_{\min}(W^{(B_1)\cdots(B_n)})$ as $X_n$ in its first and second parts, respectively, the second conditions have to be proven. In the argument of [5], the partial distances of a kernel correspond to the random variables $D_n$ in Proposition 15.

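Definition 16: The partial distance of a kernel $g : \mathcal{X}^\ell \to \mathcal{X}^\ell$ is defined as

$$D^{(i)}_{x,x'}(u_0^{i-1}) := \min_{v_{i+1}^{\ell-1},\, w_{i+1}^{\ell-1}} d\big(g(u_0^{i-1}, x, v_{i+1}^{\ell-1}),\, g(u_0^{i-1}, x', w_{i+1}^{\ell-1})\big)$$

where $d(a, b)$ denotes the Hamming distance between $a \in \mathcal{X}^\ell$ and $b \in \mathcal{X}^\ell$.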

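We also use the following quantities.

$$D^{(i)}_{\max} := \max_{x \in \mathcal{X},\, x' \in \mathcal{X},\, x \neq x',\, u_0^{i-1} \in \mathcal{X}^i} D^{(i)}_{x,x'}(u_0^{i-1})$$

$$D^{(i)}_{\min} := \min_{x \in \mathcal{X},\, x' \in \mathcal{X},\, x \neq x',\, u_0^{i-1} \in \mathcal{X}^i} D^{(i)}_{x,x'}(u_0^{i-1})$$

When $g$ is linear, $D^{(i)}_{x,x'}(u_0^{i-1})$ does not depend on $x$, $x'$ or $u_0^{i-1}$, in which case we will use the notation $D^{(i)}$ instead of $D^{(i)}_{x,x'}(u_0^{i-1})$.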
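For a linear kernel over a prime field the partial distances can be found by exhaustive search. Writing $G_j$ for the $j$-th row of $G$, the difference $g(u_0^{i-1}, x, v) - g(u_0^{i-1}, x', w)$ equals $(x - x') G_i + \sum_{j > i} (v_j - w_j) G_j$, so $D^{(i)}$ is the minimum Hamming weight of $a G_i + \sum_{j > i} t_j G_j$ with $a \neq 0$. The sketch below assumes $\mathcal{X} = \mathrm{GF}(p)$ with $p$ prime and is only practical for small $\ell$ and $p$.

```python
import itertools

def partial_distances(G, p):
    """D^(i) of a linear kernel x = u G over GF(p), p prime, by brute force:
    the minimum weight of a * G[i] + sum_{j > i} t_j * G[j] over a != 0 and all t."""
    ell = len(G)
    D = []
    for i in range(ell):
        best = ell
        for a in range(1, p):
            for t in itertools.product(range(p), repeat=ell - i - 1):
                coeffs = (a,) + t  # coefficients of rows i, i+1, ..., ell-1
                word = [sum(c * G[i + r][j] for r, c in enumerate(coeffs)) % p
                        for j in range(ell)]
                best = min(best, sum(1 for s in word if s != 0))
        D.append(best)
    return D

print(partial_distances([[1, 0], [1, 1]], 2))  # Arikan's kernel: [1, 2]
```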

From Lemma 21 in the appendix, the following lemma is obtained.

Lemma 17: For $i \in \{0, \ldots, \ell-1\}$,

Corollary 18: For $i \in \{0, 1\}$,
From Proposition 15 and Corollary 18, the following theorem is obtained.

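Theorem 19: Assume $P(I_\infty(W) \in \{0, 1\}) = 1$. It holds

$$\lim_{n \to \infty} P\Big(Z(W^{(B_1)\cdots(B_n)}) < 2^{-\ell^{n\beta}}\Big) = I(W)$$

for $\beta < \frac{1}{\ell} \sum_{i=0}^{\ell-1} \log_\ell D^{(i)}_{\min}$. When $Z_{\min}(W) > 0$,

$$\lim_{n \to \infty} P\Big(Z(W^{(B_1)\cdots(B_n)}) < 2^{-\ell^{n\beta}}\Big) = 0$$

for $\beta > \frac{1}{\ell} \sum_{i=0}^{\ell-1} \log_\ell D^{(i)}_{\max}$.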

When $g$ is a linear kernel represented by a square matrix $G$, $\frac{1}{\ell} \sum_{i=0}^{\ell-1} \log_\ell D^{(i)}$ is called the exponent of $G$ [5].
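Given the partial distances, the exponent is a one-line computation. This small helper (our illustration) evaluates $\frac{1}{\ell} \sum_i \log_\ell D^{(i)}$. By Theorem 19, for $\beta$ below this value a fraction of subchannels approaching $I(W)$ has $Z$ below $2^{-\ell^{n\beta}} = 2^{-N^\beta}$, where $N = \ell^n$ is the block length.

```python
import math

def kernel_exponent(D):
    """(1/ell) * sum_i log_ell D^(i) for partial distances D = [D^(0), ..., D^(ell-1)]."""
    ell = len(D)
    return sum(math.log(d, ell) for d in D) / ell

print(kernel_exponent([1, 2]))        # Arikan's 2x2 kernel
print(kernel_exponent([1, 2, 2, 4]))  # its 2-fold Kronecker power
print(kernel_exponent([1, 2, 3, 4]))  # D^(i) = i + 1 with ell = 4
```

The first two both give 0.5, so taking Kronecker powers of Arıkan's matrix does not change the exponent, while partial distances $D^{(i)} = i + 1$ with $\ell = 4$ give about 0.57312.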

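Example 20: Assume that $\mathcal{X}$ is a field and that $\alpha \in \mathcal{X}$ is a primitive element. For a non-zero element $\gamma \in \mathcal{X}$, let

$$G_{RS}(q) := \begin{bmatrix}
1 & 1 & \cdots & 1 & 1 & 0 \\
\alpha^{(q-2)(q-2)} & \alpha^{(q-3)(q-2)} & \cdots & \alpha^{q-2} & 1 & 0 \\
\alpha^{(q-2)(q-3)} & \alpha^{(q-3)(q-3)} & \cdots & \alpha^{q-3} & 1 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
\alpha^{q-2} & \alpha^{q-3} & \cdots & \alpha & 1 & 0 \\
1 & 1 & \cdots & 1 & 1 & \gamma
\end{bmatrix}.$$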
1 7 



Since

$$G_{RS}(2) = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix},$$

$G_{RS}(q)$ can be regarded as a generalization of Arıkan's original matrix. The relation between binary polar codes and binary Reed-Muller codes [1] also holds for q-ary polar codes using $G_{RS}(q)$ and q-ary Reed-Muller codes. From Theorem 13, the channel polarization phenomenon occurs on $G_{RS}(q)$ for any $\gamma \neq 0$ when $q$ is a prime. When $\gamma$ is a primitive element, from Corollary 14 the channel polarization phenomenon occurs on $G_{RS}(q)$ for any prime power $q$. We call $G_{RS}(q)$ the Reed-Solomon kernel since the submatrix which consists of the $i$-th to $(q-1)$-th rows of $G_{RS}(q)$ is a generator matrix of a generalized Reed-Solomon code, which is a maximum distance separable code, i.e., $D^{(i)} = i + 1$. Hence, the exponent of $G_{RS}(q)$ is $\frac{1}{\ell} \sum_{i=0}^{\ell-1} \log_\ell(i + 1)$, where $\ell = q$. Since

$$\frac{1}{\ell} \sum_{i=0}^{\ell-1} \log_\ell(i + 1) \ge \frac{1}{\ell} \int_1^\ell \log_\ell x \, dx = 1 - \frac{\ell - 1}{\ell \log_e \ell},$$

the exponent of the Reed-Solomon kernel tends to 1 as $\ell = q$ tends to infinity. When $q = 2^2$, the exponent of the Reed-Solomon kernel is $\log_e 24 / (4 \log_e 4) \approx 0.57312$. In Arıkan's original work, the exponent of the $2 \times 2$ matrix is 0.5 [3].
In [5], Korada, Şaşoğlu, and Urbanke showed that the exponent can be improved by using large kernels, and found a matrix of size 16 whose exponent is about 0.51828. The above-mentioned Reed-Solomon kernel with $q = 2^2$ is reasonably small and simple, but has a larger exponent than binary linear kernels of small size. This demonstrates the usefulness of considering q-ary rather than binary channels. A q-ary DMC where $q$ is not a prime can be decomposed into subchannels whose input sizes are prime numbers [6] by using the method of multilevel coding [8]. The above example shows that when $q$ is a power of a prime, an asymptotically better coding scheme can be constructed without decomposing the q-ary DMC, by using q-ary polar codes with $G_{RS}(q)$.
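The claims of Example 20 can be checked numerically when $q = p$ is a prime, where field arithmetic is just arithmetic mod $p$. This sketch is a hypothetical illustration: it builds $G_{RS}(p)$ by evaluating $x^{q-1-r}$ at the points $\alpha^{q-2}, \ldots, \alpha, 1, 0$, which matches the rows of $G_{RS}(q)$ in Example 20 (the last row is the constant 1, with $\gamma$ at the point 0). It then confirms by brute force that $D^{(i)} = i + 1$ and compares the exponent $\log_e(q!)/(q \log_e q)$ with the integral lower bound $1 - (q-1)/(q \log_e q)$. Prime powers such as $q = 4$ would need $\mathrm{GF}(q)$ arithmetic, which is not implemented here.

```python
import itertools
import math

def reed_solomon_kernel(p, alpha, gamma):
    """G_RS(q) for a prime q = p with arithmetic mod p (alpha must be primitive mod p).
    Columns 0..q-2 evaluate at alpha^(q-2-c); the last column is the point 0.
    Row r < q-1 holds x^(q-1-r); the last row is (1, ..., 1, gamma)."""
    q = p
    points = [pow(alpha, q - 2 - c, p) for c in range(q - 1)]
    G = [[pow(b, q - 1 - r, p) for b in points] + [0] for r in range(q - 1)]
    G.append([1] * (q - 1) + [gamma % p])
    return G

def partial_distances(G, p):
    """Brute-force D^(i) of the linear kernel u -> u G over GF(p)."""
    ell = len(G)
    D = []
    for i in range(ell):
        best = ell
        for coeffs in itertools.product(range(p), repeat=ell - i):
            if coeffs[0] == 0:
                continue  # the coefficient of row i must be non-zero
            word = [sum(c * G[i + r][j] for r, c in enumerate(coeffs)) % p
                    for j in range(ell)]
            best = min(best, sum(1 for s in word if s != 0))
        D.append(best)
    return D

def rs_exponent(q):
    """(1/q) * sum_{i=0}^{q-1} log_q(i+1) = log(q!) / (q log q)."""
    return math.lgamma(q + 1) / (q * math.log(q))

G5 = reed_solomon_kernel(5, alpha=2, gamma=1)  # 2 is primitive mod 5
print(G5)
print(partial_distances(G5, 5))
for q in (2, 4, 16, 256):
    print(q, rs_exponent(q), 1 - (q - 1) / (q * math.log(q)))
```

For $q = 5$ the search returns $[1, 2, 3, 4, 5]$, as the MDS property requires, `reed_solomon_kernel(2, 1, 1)` reproduces Arıkan's matrix, and `rs_exponent(4)` is about 0.57312.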

V. Conclusion 

The channel polarization phenomenon on q-ary channels has been considered. We give several sufficient conditions on kernels under which the channel polarization phenomenon occurs. We also show an explicit construction with a q-ary linear kernel $G_{RS}(q)$ for $q$ being a power of a prime. The exponent of $G_{RS}(q)$ is $\log_e(q!)/(q \log_e q)$, which is larger than the exponent of binary matrices of small size even if $q = 4$. Our discussion includes channel polarization on non-linear kernels as well. It is known that non-linear binary codes may have a larger minimum distance than linear binary codes, e.g., the Nordstrom-Robinson code [9]. This suggests that there may exist a non-linear kernel with a larger exponent than any linear kernel of the same size.



Appendix 



Lemma 21: 



Proof: For the second inequality, one has 



"o — V o "o 

= <f E \fw^yt\< 1 1 x)w^{yl 1 y - 1 \ *o 

Vo 

~ „e-i-i E ( E 



Vo N "i+i >™ i+ i 



1 I 1 ...t-l\ 



E 



Vo 



i-i i-i e-i 
y v i+1 ,tv i+1 



The first inequality is obtained as follows. 



e-i 
Vo 



<£ ^W^iy 1 ,- 1 ,^ 1 | i)^^- 1 ^- 1 | x') 



e-i 
Vo 



E ( n 2u-i 



_-i \ f-i t-i 
Vo X+i ' w i+i 



(t-l-i) 



> 



E E „2(£-l 



- -1 t—1 £—1 Q 
Vo « <+ i 



1 - D^.iutr 1 ) 



Acknowledgment 

TT acknowledges support of the Grant-in-Aid for Scientific 
Research on Priority Areas (No. 18079010), MEXT, Japan. 

References 

[1] E. Arıkan, "Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels," IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3051-3073, July 2009.

[2] S. Korada and R. Urbanke, "Polar codes are optimal for lossy source coding," 2009. [Online]. Available: http://arxiv.org/abs/0903.0307

[3] E. Arıkan and E. Telatar, "On the rate of channel polarization," in Proc. 2009 IEEE International Symposium on Information Theory, June 28-July 3, 2009, pp. 1493-1495.

[4] T. Tanaka and R. Mori, "Refined rate of channel polarization," 2010. [Online]. Available: http://arxiv.org/abs/1001.2067

[5] S. Korada, E. Şaşoğlu, and R. Urbanke, "Polar codes: characterization of exponent, bounds, and constructions," 2009. [Online]. Available: http://arxiv.org/abs/0901.0536

[6] E. Şaşoğlu, E. Telatar, and E. Arıkan, "Polarization for arbitrary discrete memoryless channels," 2009. [Online]. Available: http://arxiv.org/abs/0908.0302

[7] E. Şaşoğlu, E. Telatar, and E. Arıkan, "Polarization for arbitrary discrete memoryless channels," in Proc. 2009 IEEE Information Theory Workshop, Taormina, Italy, 11-16 Oct. 2009, pp. 144-148.

[8] H. Imai and S. Hirakawa, "A new multilevel coding method using error-correcting codes," IEEE Trans. Inf. Theory, vol. 23, no. 3, pp. 371-377, May 1977.

[9] F. MacWilliams and N. Sloane, The Theory of Error-Correcting Codes. Amsterdam: North-Holland, 1977.



